Your task in this exercise is to create an R Markdown document that reports an analysis based on the (synthetic) data from the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany. The output should be an HTML document. For the R Markdown output (and potentially also other types of output you may produce), we suggest that you create a separate folder in your project directory named output.
In the R Markdown documents we will create in this session and the one on Advanced R Markdown & LaTeX, we will use a few packages in addition to the tidyverse. For creating R Markdown documents, you need the rmarkdown package. You can check whether you have a certain package installed via the Packages tab in the RStudio GUI. Alternatively, you can run the following command in the R console: "rmarkdown" %in% rownames(installed.packages()). We will also need the following packages: knitr (should have been installed with RStudio), corrr for computing correlations, and stargazer for creating tables. As a reminder, you can install multiple packages via the command install.packages(c("corrr", "stargazer")).
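The check described above can be extended to all four packages at once. The following is only a minimal sketch; the object names needed and missing_pkgs are made up for illustration, and the install call is left commented out so that nothing is installed by accident.

```r
# Packages named in the text above
needed <- c("rmarkdown", "knitr", "corrr", "stargazer")

# Keep only those that are not among the installed packages
missing_pkgs <- needed[!needed %in% rownames(installed.packages())]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All required packages are installed.")
}
```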
Please note that the analyses we propose here are only toy examples. If you are or feel familiar enough with R and R Markdown and are interested in exploring other questions with the data, feel free to do something else in the R Markdown documents (e.g., use other variables and/or analysis methods). Also feel free to adjust/change anything else in the solution code (e.g., the document title, formatting, structure/headings, etc.).
We have created a sample solution for the R Markdown report that you should create in this exercise. You can find the .Rmd file as well as the HTML output in the solutions folder within the workshop materials: Report1_Obeying_Curfew.Rmd and Report1_Obeying_Curfew.html.
As a general note: The code shown in the solution fields for this set of exercises needs to be included in code chunks in the R Markdown document for which you may also need (or want) to specify (extra) chunk options.
That being said, let’s get started with creating the reproducible R Markdown report. In this report we want to look at factors predicting willingness to comply with curfew measures during early phases of the COVID-19 pandemic in Germany.
R Markdown file in the src folder in your project directory. The output type should be HTML. Also specify a meaningful title and a subtitle, add an author name, and a date in the YAML header.
R Markdown document via the File tab in the RStudio menu: File -> New File -> R Markdown. Make sure to select HTML as the output format. You can also specify a title and add your name into the Author field in the GUI menu. You can manually specify a date in the YAML header or insert (inline) R code for automatically pasting the current date.
If you want to, you can also make your HTML output easier to use and also prettier by specifying some additional options in the YAML header. To increase the usability of the resulting HTML document, you can add a table of contents (TOC) with 2 levels, make the TOC float (i.e., move when you scroll through the document), number the sections (as specified by the headers), hide the code by default, and include a button for downloading the underlying .Rmd document.
html_document specification. Have a look at the lecture slides for an example of how to do this.
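As a rough illustration of the options listed above, the sketch below builds the same output format in R with rmarkdown::html_document(). This is the programmatic counterpart, not the YAML header itself; the argument names match the keys you would nest under html_document in the YAML header. The object name report_format is made up for this example.

```r
library(rmarkdown)

# Same option names as the keys under `output: html_document:` in the YAML header
report_format <- html_document(
  toc = TRUE,              # add a table of contents
  toc_depth = 2,           # include two header levels
  toc_float = TRUE,        # TOC stays visible while scrolling
  number_sections = TRUE,  # number the sections
  code_folding = "hide",   # code hidden by default, shown on click
  code_download = TRUE     # button for downloading the .Rmd file
)
```

An object like this could be passed to the output_format argument of rmarkdown::render(); for this exercise, setting the options in the YAML header is the more common route.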
knitr code chunk options. Make sure that the code is displayed in the output document, but that messages are suppressed.
HTML document. You can achieve this by setting the chunk option include to FALSE for the setup chunk.
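A possible body for the setup chunk is sketched below. It assumes the chunk header itself carries include = FALSE (for example, {r setup, include = FALSE}), so that the chunk does not show up in the output.

```r
library(knitr)

# Global defaults for all following chunks:
# show the code in the output, but suppress messages
opts_chunk$set(echo = TRUE, message = FALSE)
```

Individual chunks can still override these defaults through their own chunk options.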
R script we have created in the previous session.
data_wrangling_gpc.R and stored in the src folder within your project repository) or you can call the script from within the R Markdown document. The command for executing an R script from another R script or an R Markdown document is source() which takes as its first argument a string containing the path to the file.
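Calling the script could look roughly like this. The relative path is an assumption: it presumes that the working directory is the project root. When knitting, chunks are by default evaluated relative to the folder containing the .Rmd file, so the path may need adjusting.

```r
# Assumed location of the wrangling script, relative to the project root
script_path <- file.path("src", "data_wrangling_gpc.R")

if (file.exists(script_path)) {
  # Runs the script; objects it creates become available in this session
  source(script_path)
} else {
  message("Script not found at ", script_path, "; check the path.")
}
```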
Now that we have set up the document, let’s start generating some content for it. We propose the following structure (but, again, feel free to change/adapt this): Research question, Methods with the two subsections Sample and Measure, Results with the three subsections Descriptive statistics, Correlations, and Regression analysis, and a final Discussion section.
You can already add some text to the Research question section. Markdown conventions for specifying different header levels using the # symbol.
R code).
risk_self, risk_surroundings, and risk_infect_others, and the outcome/target variable is obey_curfew.
risk_ variables using the stargazer package. For all analyses, we want to use the data from the reduced/filtered data set (which should be called corona_survey_noncrit).
dplyr::select(). Remember that the stargazer() function does not work with tibbles and expects a data frame. Since we want to create HTML output, you should specify "html" as the type for the stargazer table. For prettier results, you may also want to give the table a title, specify that you want two decimal places, and change the variable names (not in the data set but in the pipe for generating the table). To have the table properly displayed in the output document, you need to set the chunk option results = 'asis'.
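The steps in this hint can be sketched as follows. To keep the example runnable on its own, it uses simulated toy data (toy_survey, with an arbitrary 1 to 7 response scale) in place of corona_survey_noncrit; the title and labels are placeholders as well.

```r
library(dplyr)
library(stargazer)

# Simulated stand-in for corona_survey_noncrit (values are made up)
set.seed(42)
toy_survey <- tibble::tibble(
  risk_self = sample(1:7, 100, replace = TRUE),
  risk_surroundings = sample(1:7, 100, replace = TRUE),
  risk_infect_others = sample(1:7, 100, replace = TRUE)
)

summary_html <- toy_survey %>%
  select(risk_self, risk_surroundings, risk_infect_others) %>%
  as.data.frame() %>%  # stargazer() expects a data frame, not a tibble
  stargazer(
    type = "html",
    title = "Descriptive statistics",
    digits = 2,
    covariate.labels = c("Risk self", "Risk surroundings", "Risk infect others")
  )
```

In the report itself, the same pipe would start from corona_survey_noncrit and sit in a chunk with results = 'asis'.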
corrr package (for computing the correlations) and combine it with a function from the knitr package for creating the table.
corrr package is its correlate() function. The knitr function for creating tabular output is kable(). To make the output nicer to read, we can use two additional functions from the corrr package for removing double entries (i.e., the upper triangle) from the correlation matrix and cleanly formatting it. You can consult the corrr package documentation to find out how to do this. In addition, you might want to add a caption and change the column names via two arguments in the knitr::kable() function.
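One way to combine these functions is sketched below, again on simulated toy data rather than corona_survey_noncrit. It uses shave() and fashion() from corrr for removing the upper triangle and formatting the values; the caption and column names are placeholders.

```r
library(dplyr)
library(corrr)
library(knitr)

# Simulated stand-in for corona_survey_noncrit (values are made up)
set.seed(42)
toy_survey <- tibble::tibble(
  risk_self = sample(1:7, 100, replace = TRUE),
  risk_surroundings = sample(1:7, 100, replace = TRUE),
  risk_infect_others = sample(1:7, 100, replace = TRUE)
)

corr_table <- toy_survey %>%
  select(risk_self, risk_surroundings, risk_infect_others) %>%
  correlate() %>%  # correlation data frame
  shave() %>%      # set the upper triangle to NA
  fashion() %>%    # round and format for printing
  kable(
    caption = "Correlations between the risk perception variables",
    col.names = c("Variable", "Risk self", "Risk surroundings", "Risk infect others")
  )

corr_table
```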
obey_curfew variable as the outcome. Let’s first compute this simple model before we produce a table to present its results.
R yet, a simple web search should quickly turn up many useful results that can show you how to do this. As a hint or reminder, you need the glm() function which expects the first argument (i.e., the regression formula) in the same format as the lm() function. However, in addition to that, you need to specify that you want to use a logit link (via the family argument).
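A minimal sketch of such a model is shown below. It assumes that obey_curfew is coded as a binary 0/1 variable and uses simulated toy data, so the estimates themselves are meaningless; the object names toy_survey and model are made up for this example.

```r
# Simulated stand-in for corona_survey_noncrit (values are made up)
set.seed(42)
toy_survey <- data.frame(
  risk_self = sample(1:7, 100, replace = TRUE),
  risk_surroundings = sample(1:7, 100, replace = TRUE),
  risk_infect_others = sample(1:7, 100, replace = TRUE),
  obey_curfew = rbinom(100, size = 1, prob = 0.6)  # assumed 0/1 coding
)

# Same formula syntax as lm(); the family argument selects the logit link
model <- glm(
  obey_curfew ~ risk_self + risk_surroundings + risk_infect_others,
  family = binomial(link = "logit"),
  data = toy_survey
)

summary(model)
```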
stargazer package for this in a similar way as for the summary stats table.
stargazer() function expects one or more model objects. For a prettier table, you can also specify labels for the dependent variables and the predictors. Have a look at the help file for the stargazer() function via ?stargazer() to see how to do this. Also here, for correct printing in the output document you need to set the chunk option results = 'asis'.
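A regression table along these lines could be produced as sketched below. The model is refitted on simulated toy data so that the example runs on its own; the title and labels are placeholders.

```r
library(stargazer)

# Simulated stand-in for corona_survey_noncrit (values are made up)
set.seed(42)
toy_survey <- data.frame(
  risk_self = sample(1:7, 100, replace = TRUE),
  risk_surroundings = sample(1:7, 100, replace = TRUE),
  risk_infect_others = sample(1:7, 100, replace = TRUE),
  obey_curfew = rbinom(100, size = 1, prob = 0.6)  # assumed 0/1 coding
)

model <- glm(
  obey_curfew ~ risk_self + risk_surroundings + risk_infect_others,
  family = binomial(link = "logit"),
  data = toy_survey
)

# HTML regression table with readable labels
reg_html <- stargazer(
  model,
  type = "html",
  title = "Logistic regression predicting curfew compliance",
  dep.var.labels = "Obeying curfew",
  covariate.labels = c("Risk self", "Risk surroundings", "Risk infect others")
)
```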
R session (OS, R version, loaded packages) to the document. Add another section at the end of the document called Reproducibility information and display this information there.
base R function to access info about your session (function name hint 1) and another one for printing it (function name hint 2). You can exclude information about your locale settings there.
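One possible way to do this is sketched below, using sessionInfo() and its print method; the object name session is made up for this example.

```r
# Collect information about the OS, R version, and loaded packages
session <- sessionInfo()

# Print it without the locale settings
print(session, locale = FALSE)
```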
R Markdown document, let’s knit the HTML output. You should save the resulting .html file in the output folder within your project repository.
output folder (by default, output documents are stored in the same location as the source file). Alternatively, you can use the render() command from the rmarkdown package and provide it with the name/location of the file you want to knit and a path for the output file. For the second option, make sure that your working directory is set appropriately (or adjust the file paths in the solution accordingly).
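The render() option might look roughly like this. The file name is borrowed from the sample solution and the paths assume that the working directory is the project root; both are assumptions that you may need to adapt.

```r
library(rmarkdown)

# Assumed location and name of the report source file
rmd_file <- file.path("src", "Report1_Obeying_Curfew.Rmd")

if (file.exists(rmd_file)) {
  # Knit with the format from the YAML header and write the result to ./output
  render(input = rmd_file, output_dir = "output")
} else {
  message("No file found at ", rmd_file, "; adjust the path.")
}
```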
Finally, as before, after you have created and saved the .Rmd document and the HTML output, in Git, add the files, commit them (with a meaningful commit message), and push the changes to your GitHub repository.